Abstract 

Background: A deficiency in phaseolin and phytohemagglutinin is associated with a near doubling of sulfur amino 
acid content in genetically related lines of common bean (Phaseolus vulgaris), particularly cysteine, elevated by 70%, 
and methionine, elevated by 10%. This mostly takes place at the expense of an abundant non-protein amino acid, 
S-methyl-cysteine. The deficiency in phaseolin and phytohemagglutinin is mainly compensated by increased levels 
of the 11S globulin legumin and residual lectins. Legumin, albumin-2, defensin and albumin-1 were previously 
identified as contributing to the increased sulfur amino acid content in the mutant line, on the basis of similarity 
to proteins from other legumes. 

Results: Profiling of free amino acids in developing seeds of the BAT93 reference genotype revealed a biphasic 
accumulation of gamma-glutamyl-S-methyl-cysteine, the main soluble form of S-methyl-cysteine, with a lag phase 
occurring during storage protein accumulation. A collection of 30,147 expressed sequence tags (ESTs) was 
generated from four developmental stages, corresponding to distinct phases of gamma-glutamyl-S-methyl-cysteine 
accumulation, and covering the transitions to reserve accumulation and desiccation. Analysis of gene ontology 
categories indicated the occurrence of multiple sulfur metabolic pathways, including all enzymatic activities 
responsible for sulfate assimilation, de novo cysteine and methionine biosynthesis. Integration of genomic and 
proteomic data enabled the identification and isolation of cDNAs coding for legumin, albumin-2, defensin D1 and 
albumin-1A and -B induced in the absence of phaseolin and phytohemagglutinin. Their deduced amino acid 
sequences have a higher content of cysteine than methionine, providing an explanation for the preferential 
increase of cysteine in the mutant line. 

Conclusion: The EST collection provides a foundation to further investigate sulfur metabolism and the differential 
accumulation of sulfur amino acids in seed of common bean. Identification of sulfur-rich proteins whose levels are 
elevated in seed lacking phaseolin and phytohemagglutinin and sulfur metabolic genes may assist the 
improvement of protein quality. 



* Correspondence: Frederic.Marsolais@agr.gc.ca 
Agriculture and Agri-Food Canada, Southern Crop Protection and Food 
Research Centre, 1391 Sandford St., London, ON N5V 4T3 Canada 
Full list of author information is available at the end of the article 

BioMed Central 

Background 

Common bean (Phaseolus vulgaris) is the most important 
leguminous food crop grown for dry seed worldwide, 
both in acreage and yield. Historically, this species has 
been an important model for the study of seed storage 
proteins [1]. In commercial cultivars, the 7S globulin 
phaseolin constitutes approximately half of total seed 
protein. Lectins are also abundant, with phytohemagglutinins 
and α-amylase inhibitors accounting for 10% and 
5% of seed protein, respectively. As in other grain 
legumes, the content of essential sulfur amino acids is 
sub-optimal for nutrition. One strategy proposed to 
improve protein quality and bioavailability of sulfur 
amino acids consists of the selection and breeding of 
highly digestible phaseolin types [2]. A different approach 
may rely on variation in storage protein composition. 
Osborn et al. developed genetically related lines integrating 
mutations conferring a deficiency in phaseolin 
and major lectins, which are encoded by unique loci [3]. 
The arcelin-phytohemagglutinin-α-amylase inhibitor 
(APA) locus was introgressed from G12882, a wild 
accession containing arcelin-1, into the commercial cultivar 
Sanilac (white navy bean), to generate the SARC1 
line. Recessive mutations from Phaseolus coccineus and 
'Great Northern 1140' were introgressed into the 
SARC1 background, conferring a deficiency in phaseolin 
and lectins, respectively. SMARC1-PN1 lacks phaseolin 
and SMARC1N-PN1 lacks phaseolin, phytohemagglutinin 
and arcelin. SARC1, SMARC1-PN1 and SMARC1N-PN1 
share a similar level (ca. 85%) of the recurrent parental 
Sanilac background. Introgression of the APA 
locus containing arcelin-1 from wild P. vulgaris is associated 
with resistance to major storage pests, the weevils 
Zabrotes subfasciatus and Acanthoscelides obtectus [4-6]. 
However, in the absence of detailed molecular information 
about the APA locus, the identity of the lectin(s) 
conferring this resistance remains elusive [7,8]. 

The deficiency in phaseolin and major lectins, phytohemagglutinin 
and arcelin, results in a nearly two-fold 
increase in sulfur amino acid content in seed, particularly 
of Cys, elevated by 70%, and Met, by 10% [9]. This 
takes place mostly at the expense of S-methyl-Cys, an 
abundant non-protein amino acid which cannot substitute 
for Met or Cys in the diet [10]. Proteomic analysis 
revealed that the lack of phaseolin and major lectins was 
mainly compensated by increases in the 11S globulin 
legumin and residual lectins, particularly the β subunit 
of α-amylase inhibitor-1, α-amylase inhibitor-like protein, 
mannose lectin FRIL, and leucoagglutinating 
phytohemagglutinin encoded by PDLEC2 [11]. Several 
proteins contributing to the increased Cys content, 
including legumin, albumin-2, defensin and albumin-1, 
were identified on the basis of similarity to related proteins 
from other legumes. Based on quantification by 2-D 
gel electrophoresis, legumin levels were raised 3-fold, 
to 17% of total protein, while albumin-2 was elevated 
10-fold, to 1.2% of total protein. Defensin and 
albumin-1 could not be quantified accurately as they 
were characterized from selective extracts. 

A significant number of expressed sequence tags 
(ESTs) have been generated from root, leaf and pod of 
common bean [12-17], but a similar resource is lacking 
for the seed, despite its nutritional importance. In addi- 
tion, transcript profiling studies of common bean using 
high density arrays have so far relied on cross-specific 
hybridization to soybean platforms [18,19], in part due 
to a lack of relevant sequence information. The objective 
of this study was to generate a common bean EST collection 
which would provide a foundation to further 
investigate the metabolism of sulfur amino acids in 
developing seeds, by using a functional genomic 
approach. Four seed developmental stages were sampled 
in the reference genotype BAT93 [20], corresponding to 
distinct stages of accumulation of γ-Glu-S-methyl-Cys, 
the major soluble form of S-methyl-Cys. In silico analysis 
revealed the occurrence of several pathways of sulfur 
metabolism ranging from sulfate assimilation to (homo)glutathione 
biosynthesis. Analysis of EST clusters provided 
detailed information on the identity and abundance 
of storage protein transcripts. Integration with 
proteomic data enabled isolation of cDNAs coding for 
legumin, albumin-2, defensin and albumin-1 isoforms 
which are elevated in the absence of phaseolin and 
major lectins. These proteins have a higher Cys than 
Met content, providing an explanation for the preferential 
increase of Cys in SMARC1N-PN1. 

Results 

Selection of seed developmental stages and generation 
of ESTs 

Free amino acids were profiled at eight developmental 
stages of BAT93 seed, to identify stages suitable to gener- 
ate ESTs (see Additional File 1). Developmental stages 
are designated after Walbot et al. [21]. According to this 
nomenclature, storage protein transcripts reach their 
maximal levels between stages IV (cotyledon) to VI 
(maturation), while storage products are accumulated 
between stages V (cotyledon) to VII (maturation) [22]. 
During stage VIII (maturation) the seed undergoes desiccation. 
Amino acid content was normalized over the 
average of developmental stages, expressed in log2 scale, 
and k-means analysis performed to reveal common developmental 
patterns. Amino acids were grouped into five 
clusters (Figure 1). The levels of most free amino acids 
decreased markedly between stages V and VII, during 
which protein reserves are actively accumulated. The 
nitrogen-rich amino acids, Arg, His and Lys, formed a 
cluster with Phe. Amino acids in this cluster were characterized 
by a marked rise in their levels during the transition 
to seed desiccation and dormancy. The second and 
third clusters contained central intermediates in amino 
acid metabolism, Ser, Glu, Ala, Gln and Asn, along with 
Leu and Met. Their levels generally decreased throughout 
seed development. This decline was more pronounced 
for amino acids in the third cluster, which included Met. 
γ-Glu-S-methyl-Cys and γ-Glu-Leu formed a unique 
cluster as their levels rose throughout seed development. 
γ-Glu-S-methyl-Cys showed a biphasic curve of accumulation, 
with a continuous rise until stage V, followed by a 
plateau until stage VII, and a resumption of accumulation 
from stage VIII (see Additional file 1, Figure 1). In comparison, 
the levels of free S-methyl-Cys were high at 
stages III and IV, and decreased at stages V and VI, in 
parallel with Met levels. 
Figure 1 Cluster analysis of free amino acid profiles during 
seed development. Amino acids were classified using k-means 
analysis into five groups. The names of amino acids belonging to 
each group are indicated on the right. Data on the y axis is the 
amino acid content (see Additional file 1) normalized to the average 
and expressed in log2 scale. Developmental stages are named 
according to Walbot et al. [21]. Timing of phaseolin (Phs) and 
phytohemagglutinin (Pha) transcript and phaseolin protein 
accumulation is according to Bobb et al. [22]. 
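
The normalization and clustering used for the free amino acid profiles can be illustrated with a minimal sketch. It assumes that each profile is divided by its own mean across stages before taking log2, and it uses a plain k-means with a deterministic start (the first k profiles as initial centroids). The profile names and values are hypothetical and are not data from this study; the software and settings actually used by the authors are not stated here.

```python
import math

def log2_normalize(profile):
    """Divide each value by the profile mean, then express it in log2 scale."""
    mean = sum(profile) / len(profile)
    return [math.log2(value / mean) for value in profile]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50):
    """Plain k-means; the first k points serve as initial centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical profiles (arbitrary units) over four developmental stages.
profiles = {
    "rising_1": [1.0, 2.0, 4.0, 8.0],
    "falling_1": [8.0, 4.0, 2.0, 1.0],
    "rising_2": [2.0, 4.0, 8.0, 16.0],
    "falling_2": [16.0, 8.0, 4.0, 2.0],
}
names = list(profiles)
normalized = [log2_normalize(profiles[name]) for name in names]
clusters = dict(zip(names, kmeans(normalized, k=2)))
```

Because each profile is scaled by its own mean, compounds with the same developmental pattern but different absolute abundance fall into the same cluster, which is the purpose of normalizing before clustering.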



Four developmental stages were selected to generate 
ESTs, corresponding to distinct phases of γ-Glu-S-methyl-Cys 
accumulation: stage IV - cotyledon [14 days 
after fertilization (DAF), 29 mg seed weight] and stage 
V - cotyledon (16 DAF, 48 mg seed weight), coinciding 
with early γ-Glu-S-methyl-Cys accumulation, stage VI - 
maturation (21 DAF, 164 mg seed weight), coinciding 
with the lag phase, and stage VIII - maturation (30 
DAF, 380 mg seed weight), coinciding with the resumption 
of accumulation. A total of 8,725, 9,537, 5,260 and 
6,625 ESTs were generated for each respective developmental 
stage. Of these, Gene Ontology (GO) annotation 
from Arabidopsis [23] was retrieved for 5,511, 3,825, 
1,561 and 3,303 ESTs, respectively. Assembly of the 
total 30,147 ESTs yielded 3,658 contigs and 6,027 
singletons. 
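
The EST counts reported above can be cross-checked with a short calculation. The sketch below only recombines the numbers given in the text; the variable names, and the use of "unique sequences" for the sum of contigs and singletons, are labels introduced here for illustration.

```python
# EST counts per developmental stage, as reported in the text.
ests = {"IV": 8725, "V": 9537, "VI": 5260, "VIII": 6625}
# ESTs per stage for which Arabidopsis GO annotation was retrieved.
annotated = {"IV": 5511, "V": 3825, "VI": 1561, "VIII": 3303}

total_ests = sum(ests.values())
total_annotated = sum(annotated.values())

# Percentage of ESTs with GO annotation at each stage.
pct_annotated = {
    stage: round(100 * annotated[stage] / ests[stage], 1) for stage in ests
}

# Contigs plus singletons after assembly.
unique_sequences = 3658 + 6027

print(total_ests, total_annotated, unique_sequences)
print(pct_annotated)
```

The stage totals sum to the 30,147 ESTs stated in the text, and the share of ESTs with GO annotation ranges from about 30% at stage VI to about 63% at stage IV.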

In silico analysis of ESTs identifies features of sulfur and 
amino acid metabolism in developing seeds 

The representation of general GO categories of biological 
process and molecular function was analyzed during seed 
development. The percentage of ESTs assigned to a GO 
category (see Additional file 2) was normalized to the 
average over seed development, expressed in log 2 scale 
and submitted to hierarchical clustering analysis (Figure 
2). Categories of cellular metabolism, primary metabo- 
lism, macromolecule metabolism and cellular protein 
metabolism clustered together. Their representation sug- 
gested that general metabolic activity was highest at stage 
IV, progressively decreased until stage VI and then 
increased at stage VIII. The same general trend was 
observed for the categories of amino acid and sulfur 
compound metabolic processes. The decline in metabolic 
activity coincided with accumulation of storage products. 
Indeed, the nutrient reservoir activity category was most 
highly represented at stage VI, followed by stage V. This 
category clustered away from all other categories. Hormone 
biosynthetic process also had a unique profile, being 
highest at the first stage of development and markedly 
down at the last two stages. The percentage of ESTs 
assigned to the category of photosynthesis was highest at 
stage IV, followed by stage V, and further decreased at 
stages VI and VIII, consistent with the loss of chlorophyll 
pigmentation in cotyledons during seed maturation [21]. 
Interestingly, categories of photosynthesis, response to 
oxidative stress and response to radiation were grouped 
together. The category of response to water deprivation 
had highest levels at stage VIII, corresponding to seed 
desiccation. Analysis of GO categories related to the sup- 
ply of nitrogen to developing seeds indicated that ureide 
catabolism, comprising allantoinase and allantoicase 
activities, was most highly represented at stage V, coin- 
ciding with the onset of storage protein accumulation, 
whereas the percentage of ESTs associated with asparagi- 
nase activity increased steadily until stage VI, the mid- 
point of storage protein accumulation (Figure 3A). 
Five different GO categories were associated with Arg 
metabolism (Figure 3B). Of these, argininosuccinate 
lyase activity, the last step in Arg biosynthesis, was 
represented by a high percentage of ESTs at stage IV, 
0.80%. This correlates with an increase in steady-state 
Arg levels between stages III and V (see Additional file 
1). Analysis of categories related to sulfur metabolism 
indicated that all enzymatic activities associated with 
sulfur assimilation, de novo Cys and Met biosynthesis 
were represented in the EST dataset (Figure 4). These 
include sulfate adenylyltransferase, adenylyl-sulfate 
reductase and sulfite reductase; Ser O-acetyltransferase 
and Cys synthase; and cystathionine γ-synthase, 
cystathionine β-lyase and Met synthase. Homocysteine 
S-methyltransferase, involved in the production of Met 
via the S-methylmethionine cycle, the catabolic enzyme 
Met γ-lyase and the two enzymes involved in (homo)glutathione 
biosynthesis, Glu-Cys ligase and (homo)glutathione 
synthase, were also represented. 

Figure 2 Cluster analysis of general GO category profiles during seed development. GO categories were classified by hierarchical clustering 

Figure 3 Representation of GO categories related to nitrogen supply (A) and Arg metabolic pathway (B) during seed development. 
Percentage of ESTs is displayed for each developmental stage. Abbreviations and GO category numbers are as follows: ASPG: asparaginase 

Analysis of seed protein ESTs 

Clustering of ESTs is an efficient tool to estimate the 
identity and abundance of transcripts in a complex 
mRNA sample. This analysis focused on the sulfur-poor 
7S globulin phaseolin and lectins, the most abundant 
seed proteins in commercial cultivars [1], and on the 
sulfur-rich 11S globulins, 2S albumins, and defensins 
whose levels are elevated in the absence of phaseolin 
and phytohemagglutinin [11]. To identify contigs coding 
for sulfur-rich proteins induced in mature seed of 
SMARC1N-PN1, conceptual translations of sequences 
were compared with de novo sequenced peptides from 
previously obtained liquid chromatography-tandem mass 
spectrometry (LC-MS/MS) data [11]. Corresponding 
cDNAs were isolated. 
Sulfur-poor 7S globulin phaseolin and lectins 
Within this subset, most ESTs were derived from the 
genes encoding the α- and β-subunits of S-type phaseolin, 
which differ by the presence of a 27 base pair insertion, 
and are present in cultivated varieties representative of 
the Mesoamerican gene pool [24,25] (Figure 5). The lectin 
encoded by lec4-B17 and phytohemagglutinin 
encoded by pha-E [26] were the next most highly represented, 
followed by α-amylase inhibitor-1 [27] and 
α-amylase inhibitor-like protein [28]. Fewer ESTs were 
observed for the leucoagglutinating phytohemagglutinin 
encoded by PDLEC2 [29], and an additional phytohemagglutinin 
previously isolated from an arcelin-5 genotype 
[30]. The phytohemagglutinin encoded by pha-L [26] and 
mannose lectin FRIL [31], previously identified in SARC1 
and SMARC1N-PN1 [11], were not represented in the 
BAT93 EST dataset. Conversely, the arcelin-5 phytohemagglutinin 
was not previously identified in SARC1 or 
SMARC1N-PN1 [11]. There was no evidence for additional 
phaseolin or lectin transcripts in BAT93. The phytohemagglutinins 
identified here are encoded by distinct 
genes, as they share only between 80 and 87% identity in 
their deduced amino acid sequence. 
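
Percent identity figures such as those quoted above depend on how the alignment is scored. A minimal sketch follows, assuming two sequences that have already been aligned (with "-" for gaps) and counting identity over the columns where neither sequence has a gap. The example sequences are invented, and other definitions of the denominator (for instance the full alignment length) are also in common use, so this is not necessarily the convention behind the values reported here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Invented toy alignment: 8 gap-free columns, 7 of them identical.
identity = percent_identity("MKTAY-LLV", "MKSAYGLLV")
```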

Sulfur-rich 11S globulins, 2S albumins and defensins 

Among sulfur-rich proteins, clustering of ESTs yielded 
three legumin, three albumin-2, four defensin and eleven 
albumin-1 contigs or singletons. Within each protein 
type, the isoform induced in SMARC1N-PN1 was 
encoded by the most highly represented contig in the 
BAT93 EST collection (Figure 5). 

The major legumin cDNA encodes a protein of 606 
amino acids (Figure 6). Removal of the signal peptide, 
spanning residues 1 to 23, generates a precursor with a 
predicted molecular weight of 66.3 kD. Cleavage after 
the conserved Asn 427 residue gives rise to an acidic 
α-subunit, with a predicted molecular weight of 46.4 kD 
and pI of 5.45, and a β-subunit, with a predicted molecular 
weight of 19.8 kD and pI of 7.03. The α-subunit 
contains a Glu-rich domain, spanning positions 260 to 
415. Conventional N-terminal and de novo peptide 
sequencing support the location of the cleavage sites 
[11,32]. De novo sequenced tryptic peptides from replicate 
samples of 2-D gel spots [11] covered 48 and 79% 
of the deduced sequences of the α- and β-subunits, 
respectively (Figure 6 and see Additional file 3A). The P. 
vulgaris legumin amino acid sequence shares 54 and 
41% identity with glycinin A5A4B3 from Glycine max 
and arachin 5 from Arachis hypogaea [33], respectively, 
but is more similar in length to arachin 5. Phylogenetic 
analysis indicated that the P. vulgaris protein belongs to 
a group of 11S globulins from legumes, comprising 
B-type, Met-poor legumins from Vicia species [34] (Figure 
7). The P. vulgaris legumin is part of a subgroup of high 
molecular weight legumins which includes group-2 glycinins 
from soybean [35], arachin 5, legumin storage 
proteins 2 and -3 from Lotus japonicus [36], the minor 
small legumin from Pisum sativum [37] and legumin-related 
high molecular weight polypeptide from Vicia 
faba [38]. Members of this subgroup are characterized 
by an extended C-terminal half of the α-subunit arising 
by expansion and mutation of a sequence repeat. 
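
The processing of the legumin precursor described above can be laid out as simple coordinate arithmetic. The sketch below uses only the positions and predicted masses given in the text; the variable names are illustrative, and the average mass per residue is a derived figure offered as a plausibility check, not a value reported by the authors.

```python
PROTEIN_LENGTH = 606        # residues encoded by the legumin cDNA
SIGNAL_PEPTIDE_END = 23     # signal peptide spans residues 1 to 23
CLEAVAGE_AFTER = 427        # cleavage after the conserved Asn 427

def segment_length(start, end):
    """Number of residues in an inclusive 1-based range."""
    return end - start + 1

alpha = (SIGNAL_PEPTIDE_END + 1, CLEAVAGE_AFTER)   # residues 24 to 427
beta = (CLEAVAGE_AFTER + 1, PROTEIN_LENGTH)        # residues 428 to 606

alpha_len = segment_length(*alpha)
beta_len = segment_length(*beta)

# Predicted subunit masses from the text, in daltons.
alpha_mass, beta_mass = 46400, 19800
alpha_mass_per_residue = round(alpha_mass / alpha_len, 1)
beta_mass_per_residue = round(beta_mass / beta_len, 1)

# The Glu-rich domain (positions 260 to 415) lies within the alpha-subunit.
glu_rich_in_alpha = alpha[0] <= 260 and 415 <= alpha[1]
```

The two subunit masses sum to about 66.2 kD, in line with the 66.3 kD predicted for the precursor once rounding is taken into account, and both subunits come out near 110 to 115 daltons per residue, a typical average for proteins.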
The albumin-2 cDNA encodes a protein of 277 amino 
acids with 80% identity to a seed albumin from Vigna 
radiata and 49% identity to albumin-2 from P. sativum 
[39] (Figure 8A). The protein contains four hemopexin- 
like repeats implicated in polyamine binding [40]. De 
novo sequenced tryptic peptides from replicate samples 
of 2-D gel spots covered 74% of the deduced sequence 
of the mature polypeptide (Figure 8A and see Additional 
file 3B) [11]. The de novo sequencing data indicated that 
the N-terminal Met residue is absent in the mature 
polypeptide. 

The defensin highly expressed in SMARC1N-PN1 had 
been previously identified on the basis of its similarity 
to a defensin from Cicer arietinum [11]. However, 
associated tryptic peptide sequences actually belong to 
antifungal defensin D1 from P. vulgaris [41], whose partial 
cDNA and N-terminal amino acid sequences were 
absent from databases. The defensin D1 cDNA isolated 
here encodes a precursor of 75 residues with 93% identity 
to defensin D2 from V. radiata [42]. The mature 
peptide is characterized by a Cys-stabilized αβ motif, 
consisting of three β-strands and one α-helix held by 
four disulfide bridges. The two tryptic peptides previously 
identified [11], spanning positions 30 to 39 and 
40 to 53, are the only ones within range of detection by 
the mass spectrometer. The second most abundant contig, 
and associated defensin D2 cDNA, encode a precursor 
polypeptide with 93% identity to a defensin from 
Vigna unguiculata. Defensins D1 and D2 are 37% identical 
in amino acid sequence. 

Figure 4 Representation of GO categories related to sulfur amino acid metabolic pathways during seed development. Percentage of 
ESTs is displayed for each developmental stage. Abbreviations and GO category numbers are as follows: APS: sulfate adenylyltransferase activity - 

Figure 5 Transcript expression of seed proteins in the BAT93 
line. Phaseolin and lectin contigs were annotated on the basis of 
nearest BLASTx hit in the UniProt database, with e-value ranging 
between 0 and 1.00E-115. For sulfur-rich proteins, only full-length 
contigs are represented. Contigs coding for sulfur-rich proteins 
elevated in SMARC1N-PN1, lacking phaseolin, phytohemagglutinin 
and arcelin, are marked with an asterisk. Phs, phaseolin; Lec, lectin; 
Pha, phytohemagglutinin; PhaL, leucoagglutinating 
phytohemagglutinin; AI, amylase inhibitor; AIL, amylase inhibitor-like 
protein. 

Two albumin-1 cDNAs, albumin-1A and -B, may 
encode the methanol-soluble albumin-1 highly 
expressed in SMARC1N-PN1. Their deduced amino 
acid sequences are 97% identical. They encode polypeptide 
precursors of 127 amino acids with 88% identity to 
a partial albumin-1 from P. vulgaris [43], and 71% identity 
to albumin-1 from G. max [44] (Figure 8B). The 
precursors give rise to chains b (residues 28 to 64) and 
a (residues 73 to 124) [45]. The b chain is arranged in a 
knottin-fold containing three β-strands held by three 
disulfide bridges [46,47]. The peptide previously de novo 
sequenced spans residues 49 to 63 of the b chain [11]. 
The presence of an Arg at position 63 generates a tryptic 
site unique to albumin-1A and -B, among the 
albumin-1 isoforms represented in the EST dataset. 
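
The effect of a single Arg on the tryptic peptides available for sequencing can be shown with a small in silico digest. The sketch assumes the common rule of thumb that trypsin cleaves after Lys or Arg except when the next residue is Pro, and it ignores missed cleavages. The sequences are invented for illustration and are not the albumin-1 sequences.

```python
def tryptic_peptides(sequence):
    """Return (start, end, peptide) tuples with 1-based inclusive positions.

    Cleaves after K or R unless the following residue is P.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        cleave = residue in "KR" and not is_last and sequence[i + 1] != "P"
        if cleave or is_last:
            peptides.append((start + 1, i + 1, sequence[start:i + 1]))
            start = i + 1
    return peptides

# Invented sequences that differ only at position 8 (Q versus R).
without_arg = tryptic_peptides("MCDGSTLQAAEK")
with_arg = tryptic_peptides("MCDGSTLRAAEK")

# Cleavage is suppressed before Pro (the R at position 7 is followed by P).
proline_case = tryptic_peptides("ACDKGHRPLMKTY")
```

In this toy case the Arg splits one 12-residue peptide into two shorter ones, which is how an isoform-specific Arg can yield a diagnostic peptide.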

Availability of cloned cDNAs provides information 
about the sulfur amino acid content of sulfur-rich proteins 
whose levels are elevated in SMARC1N-PN1. The 
mature legumin subunits contain 0.9% of their residues 
as Cys and 0.7% as Met. Mature albumin-2 contains 
1.3% of Cys and 0.4% of Met. Mature defensin D1 is 
particularly rich in Cys at 17% and lacks Met residues. 
The mature albumin-1A and -B subunits have a Cys 
content of 11%, and a Met content of 3.3 and 2.2%, 
respectively. 
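
Residue percentages of this kind follow directly from counting residues in the mature sequence. The sketch below uses an artificial 47-residue peptide containing eight Cys and no Met. Eight Cys is what four disulfide bridges imply, but the length of 47 and the sequence itself are assumptions made for illustration, not the defensin D1 sequence.

```python
def residue_percent(sequence, residue):
    """Percentage of positions in the sequence occupied by the given residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return 100.0 * sequence.count(residue) / len(sequence)

# Artificial peptide: 8 x "CAGKS" (40 residues, 8 Cys) plus 7 more residues.
toy_peptide = "CAGKS" * 8 + "AGKSAGK"

cys_percent = round(residue_percent(toy_peptide, "C"), 1)
met_percent = round(residue_percent(toy_peptide, "M"), 1)
```

With eight Cys in 47 residues the Cys content is about 17%, which shows how a short, disulfide-rich peptide reaches a Cys fraction far above that of the much longer legumin subunits.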

Purification of legumin 

Figure 6 Alignment of deduced amino acid sequences of P. vulgaris legumin and closely related 11S globulins. Vertical arrows mark 
cleavage sites after the signal peptide and between the α- and β-subunits. Horizontal bars indicate de novo sequenced tryptic peptides from 
2-D gel spots (see Additional file 3A) [11]. Species codes are as follows: Pv, P. vulgaris; Gm, G. max; and Ah, A. hypogaea. 

To further characterize legumin, this protein was purified 
from SMARC1N-PN1 by targeting the most abundant 2-D 
gel spots, 78 to 80 (59-64 kD and pI value of 5.5-5.6), 
corresponding to its α-subunit and representing 10% of 
total protein [11]. The apparent pI value was used to 
select conditions for purification by ion exchange chromatography. 
Fractions were screened by SDS-PAGE for 
bands having the appropriate molecular weight. Ammonium 
sulfate precipitation followed by ion exchange 
chromatography yielded two fractions of interest, characterized 
by size exclusion chromatography. Proteins were 
identified after in-gel trypsin digestion and LC-MS. The 
first fraction contained group 3 late embryogenesis abundant 
protein with an apparent molecular weight of 56.0 
kD (Figure 9A; see Additional file 3C). In 2-D gels, this 
protein migrated to spots 84 (64 kD and pI value of 5.4) 
and 104 (55 kD and pI value 5.4-5.5), having apparent 
molecular weight and pI values close to those of the legumin 
α-subunit. The native molecular weight of group 3 
late embryogenesis abundant protein was measured as 
494 ± 2 kD (average ± standard deviation; n = 3), suggesting 
a multimer of nine identical subunits. The second 
fraction contained the legumin polypeptide precursor, α- 
and β-subunits, with apparent molecular weights of 73.4, 
54.8 and 21.9 kD, respectively (Figure 9C; see Additional 
file 3D-F). Two minor bands, of 44.0 and 42.7 kD, were 
also identified as the α-subunit (see Additional file 3G). 
Detection of the mass of a tryptic peptide, 
423-NRGSNGIEETLCTLK-437, containing an intact cleavage 
site was characteristic of the precursor. A complex pattern 
of elution was observed after size exclusion chromatography 
(Figure 9B and 9C). A first peak contained all 
three major polypeptides, with a molecular weight of 704 
± 36 kD, suggesting possible associations between trimers 
of the precursor and hexamers of α- and β-subunits. A 
second peak contained the α-subunit alone, with a molecular 
weight of 282 ± 2 kD, suggesting a hexamer. Recovery 
of the α-subunit alone may be related to the 
inclusion of reducing agent during purification. 
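
The subunit numbers inferred from size exclusion chromatography amount to dividing the native molecular weight by the subunit molecular weight. The sketch below repeats that arithmetic with values from the text. Which subunit mass to use for the α-subunit peak (predicted, 46.4 kD, or apparent on SDS-PAGE, 54.8 kD) is an assumption made here; the text does not state which one underlies the hexamer estimate.

```python
def estimate_subunits(native_kd, subunit_kd):
    """Return (ratio, nearest whole number of subunits)."""
    ratio = native_kd / subunit_kd
    return ratio, round(ratio)

# Group 3 late embryogenesis abundant protein: 494 kD native, 56.0 kD subunit.
lea_ratio, lea_subunits = estimate_subunits(494, 56.0)

# Legumin alpha-subunit peak: 282 kD native.
alpha_predicted = estimate_subunits(282, 46.4)   # predicted subunit mass
alpha_apparent = estimate_subunits(282, 54.8)    # apparent mass on SDS-PAGE

print(f"LEA: {lea_ratio:.2f} -> {lea_subunits} subunits")
print(f"alpha (predicted mass): {alpha_predicted[0]:.2f} -> {alpha_predicted[1]}")
print(f"alpha (apparent mass): {alpha_apparent[0]:.2f} -> {alpha_apparent[1]}")
```

The ratio for the late embryogenesis abundant protein is close to nine, and the α-subunit peak is close to six only when the predicted mass is used, which illustrates how sensitive such estimates are to the choice of subunit mass.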

Discussion 

Profiling of free amino acids in BAT93 seeds enabled a 
rational selection of developmental stages to generate 
ESTs, which correspond to distinct phases of γ-Glu-S-methyl-Cys 
accumulation. Analysis of general GO 
categories related to primary metabolism, photosynthesis, 
response to water deficit and nutrient reservoir 
activity validated the representativeness of ESTs at the 
different stages. Free amino acid and GO category profiles 
confirm and extend the findings of Fait et al. [48] 
on the metabolic transitions to storage product accumulation 
and desiccation in Arabidopsis seeds. Those authors 
highlighted a decrease in the levels of free amino acids, 
particularly Asn, Gln and Lys, during reserve accumulation, 
indicative of metabolic transformations and efficient 
incorporation into storage proteins. Nitrogen-rich 
amino acids, Asn, Lys and Arg, were transiently elevated 
during the transition to seed desiccation, suggesting 
their role as a nitrogen source to support germination 
prior to the mobilization of storage protein. The same 
was true of aromatic amino acids, Trp, Phe and Tyr, 
which may support the biosynthesis of shikimate-derived 
compounds for defense and indole-acetic acid during 
germination. Steady-state transcript levels of most 
enzymes of primary metabolism examined were reduced 
during the active phase of reserve accumulation. 

The biphasic accumulation of γ-Glu-S-methyl-Cys is 
consistent with a function of S-methyl-Cys as a storage 
form of excess sulfur. During the lag phase, incorporation 
of sulfur amino acids into storage proteins may efficiently 
compete with the biosynthesis of S-methyl-Cys. 
Based on the high levels of free S-methyl-Cys detected 
at stages III and IV, substantial flux through this metabolite 
may lead to accumulation in the γ-glutamyl 
dipeptide form. The rise in Arg levels between stages III 
and V suggests a transient accumulation of nitrogen into 
this nitrogen-rich amino acid in anticipation of active 
storage protein accumulation. This increase may be 
mediated by argininosuccinate lyase activity, whose transcript 
is overrepresented during stage IV. 

Analysis of GO categories related to sulfur metabolism 
revealed the occurrence of complete pathways of sulfate 
assimilation, de novo Cys and Met biosynthesis. Similar 
findings have been reported for soybean seed [49]. These 
data may be interpreted in relation with current under- 
standing of the sulfur nutrition of legume seed. The main 
sources of sulfur transported to the seed are expected to 
be S-methylmethionine [50] and homoglutathione [51]. 
In soybean, most sulfate appears to be converted to 
homoglutathione in the pod, prior to uptake by develop- 
ing cotyledons, although some is detected in developing 
seed [52]. However, under sulfur-limiting conditions, 
glutathione exclusively, and no sulfate, is translocated to the 
seed, according to a model developed in wheat [51]. 
Interestingly, Arabidopsis knock-out mutants of sulfate 
transporters that have been characterized so far show 
increased sulfate content in mature seed, suggesting a 
function in intracellular transport rather than uptake by 
the embryo [53,54]. Besides the generation of sulfide for 
de novo Cys biosynthesis, the sulfate assimilatory pathway 
is required for the biosynthesis of the activated nucleotide 
3'-phospho-5'-adenosinephosphosulfate, the universal 
donor in sulfate transfer reactions. Adenylyl-sulfate 
kinase activity, forming 3'-phospho-5'-adenosinephosphosulfate, 
which competes with adenylyl-sulfate reductase 
for its substrate, was represented in the EST dataset 
(Figure 4). Cystathionine γ-synthase and β-lyase likely 
provide homocysteine as an acceptor for methyl transfer 
from S-methylmethionine, catalyzed by homocysteine 
S-methyltransferase [55], while Met synthase is essential to 
recycle homocysteine produced in the S-adenosylmethionine 
cycle, which appeared highly active according to the 
high representation of the Met adenosyltransferase activity 
category. Further research is required to fully understand 
the significance of the sulfate assimilatory pathway, 
and the relative contributions of de novo Cys and Met 
biosynthesis in seed metabolism. 
Figure 8 Alignment of deduced amino acid sequences of P. vulgaris albumin-2 (A) and albumin-1 (B) with related proteins. Vertical 
arrows mark cleavage sites of polypeptide precursors. Horizontal bars indicate de novo sequenced tryptic peptides from 2-D gel spots and 
SDS-PAGE bands from methanol soluble extracts (see Additional file 3B) [11]. Species codes are as in Figures 6 and 7 and also include: Vr, V. radiata. 



Previous results have shown that the absence of Cys- 
poor phaseolin and phytohemagglutinin in SMARC1N- 
PN1 leads to a shift of sulfur from S-methyl-Cys to the 
protein Cys pool [9]. By integrating proteomic and EST 
data, the deduced sequences of several proteins contri- 
buting to the increased levels of sulfur amino acids in 
SMARC1N-PN1 were identified. These proteins have a 
higher Cys than Met content, providing an explanation 
for the preferential increase in Cys over Met in the 
mutant line. 

Legumin is found at relatively low levels of ca. 3% of 
seed protein in common bean cultivars, as compared 
with other grain legumes [32]. Identification of its 
deduced amino acid sequence establishes that it is a 
member of the high molecular weight 11S globulins. 
The mechanism leading to the evolution of high molecular 
weight 11S globulins involves expansion of a 
repeat sequence at the C-terminal end of the α-subunit. 
The sequences of these repeats differ between P. vulgaris 
legumin, arachin 5 and glycinin A5A4B3 (Figure 
6). Legumin has four instances of the sequence 
NH2-HKEEEKEVEPLP-COOH, compared with four of the 
sequence NH2-GYDDD[E/D]RRP-COOH and eight of 
the sequence NH2-DDD[E/D]RR-COOH in arachin 5. 
Glycinin A5A4B3 has three repeats of the sequence 
NH2-QDEDEDEDED-COOH, compared with two in 
glycinin A4B3 [56]. The first instance of this sequence, 
spanning positions 258 to 268 of legumin, is relatively 
well conserved but not repeated in legumin and arachin 
5. These observations suggest that expansion of repeats 
probably occurred, at least in part, after the separation 
of the lineages leading to the three species. 
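
Repeat counts of the kind given above can be obtained by scanning a sequence for each motif, with bracketed alternatives such as [E/D] written as a regular-expression character class. The sketch below assumes non-overlapping matches, as returned by `re.findall`. The test sequences are invented fragments, not the legumin or arachin 5 sequences.

```python
import re

def count_motif(sequence, pattern):
    """Count non-overlapping occurrences of a regular-expression motif."""
    return len(re.findall(pattern, sequence))

# Motifs named in the text; [E/D] becomes the character class [ED].
LEGUMIN_REPEAT = "HKEEEKEVEPLP"
ARACHIN_LONG = "GYDDD[ED]RRP"
ARACHIN_SHORT = "DDD[ED]RR"

# Invented fragments for illustration only.
toy_legumin = "AAHKEEEKEVEPLPGGHKEEEKEVEPLPTT"
toy_arachin = "GYDDDERRPAADDDDRRQ"

legumin_count = count_motif(toy_legumin, LEGUMIN_REPEAT)
arachin_long_count = count_motif(toy_arachin, ARACHIN_LONG)
arachin_short_count = count_motif(toy_arachin, ARACHIN_SHORT)
```

Because the shorter arachin motif is contained within the longer one, a sequence always has at least as many matches to the short motif as to the long one, consistent with the eight versus four instances reported for arachin 5.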

Purification of legumin revealed substantial accumula- 
tion of the propolypeptide in SMARC1N-PN1. A similar 
finding has been reported in transgenic soybean where 
the expression of the α- and α′-subunits of the 7S globulin
β-conglycinin was suppressed [57]. Interestingly, the
proglycinin was shown to be trafficked directly from the 
ER to the vacuole, whereas mature 11S globulins transit 
in the Golgi apparatus. In most legume crops, α-subunits
of 11S globulins have an apparent molecular weight
of approximately 40 kD. By contrast, P. vulgaris and V.
unguiculata [58] share a single, dominant high molecular
weight legumin whose α-subunit has an apparent
molecular weight of approximately 60 kD. Similarly, in
L. japonicus, two of three major legumins, the legumin
storage proteins 2 and 3, which are closely related to
the P. vulgaris legumin, have α-subunit apparent molecular
weights of 55 and 60 kD, respectively [36]. In G.
max, an additional cleavage site in proglycinin gives rise 
to A5 and A4 subunits, having a reduced molecular 
weight [56]. The significance of the 10 kD difference 
between the predicted and apparent molecular weight of 
the major legumin α-subunit, and the detection of
minor α-subunit polypeptides of 44.0 and 42.6 kD is
unclear. The minor polypeptides may represent partial
degradation products, and the discrepancy in molecular
weight may be due to an effect of the Glu-rich domain
on electrophoretic behavior. Alternatively, these differences
may arise from a post-translational modification
of the mature α-subunit. The absence of this
modification in prolegumin, whose predicted and apparent
molecular weights are similar, would reflect differences
in trafficking, as already noted in soybean. Among high
molecular weight legumins in other legume crops, the
α-subunit of P. sativum minor small legumin accumulates
only as degradation products ranging from 21 to
32 kD, which was interpreted as an outcome of early
mobilization [59], while only the β-subunit of arachin 5
was identified in the mature seed proteome of A. hypogaea
[60].

Although phaseolin and lectins have been characterized
in great detail, the number and identity of genes
encoding these proteins are still unknown. In the present
study, there was evidence for only two phaseolin genes
in the BAT93 EST dataset. Besides the APA locus on
linkage group B4, two lectin loci have been mapped on 
linkage group B7 [61,62]. EST, proteomic [11], cDNA 
[26] and genomic sequence data [62] can be integrated 
to provide information on the composition of the APA
locus in different P. vulgaris genotypes. In BAT93 and
ARC5 genotypes, a gene order consisting of
lec4-B17/pha-E/phytohemagglutinin/α-amylase inhibitor-like
protein can be inferred. In ARC1, arc3 and arc4 genotypes,
the arcelin-5 phytohemagglutinin appears to be substi- 
tuted with pha-L. PDLEC2 encoding leucoagglutinating 
phytohemagglutinin, isolated from a phytohemaggluti- 
nin-deficient Pinto cultivar [29], whose expression is ele- 
vated in SMARC1N-PN1, may be one of the lectin 
genes located outside of the APA locus. 

Conclusions 

The BAT93 EST collection provides a foundation to 
initiate further studies of sulfur amino acid metabolism 
in developing seed, as all relevant pathways and enzyme 
activities are represented. The results presented here are 
consistent with a mechanism whereby sulfur can be effi- 
ciently partitioned between S-methyl-Cys and Cys, but 
its metabolic basis is not understood. The identification 
and characterization of sulfur-rich proteins whose levels 
are increased in the absence of phaseolin and major lec- 
tins provides an explanation for the preferential increase 
in Cys over Met. 

Methods 

Plant material and growth conditions 

BAT93, a line representative of the Mesoamerican gene 
pool of common bean (Phaseolus vulgaris) [20], was 
grown as previously described [63]. Developing seed 
samples were harvested randomly from about twenty 
plants. SMARC1N-PN1 [3] was grown in the field in 
London, ON, in 2008. 

Extraction and quantification of free amino acids 

Amino acids were extracted and quantified by HPLC
after derivatization with phenylisothiocyanate as previously
described [9], except that extraction was performed
in ethanol:water (70:30), which is optimal for
sulfur-containing γ-glutamyl dipeptides [64]. Replicate
samples consisted of independent pools of eight seeds
which were ground in liquid nitrogen, and 100 mg of
ground tissue was used for extraction. γ-Glu-S-methyl-Cys
and γ-Glu-Leu standards were from Bachem Americas
(Torrance, CA).

RNA extraction 

RNA was extracted using a modified lithium chloride 
precipitation method [63]. RNA was quantified by spectrophotometry
on a NanoDrop 2000 (Thermo Scientific,
Wilmington, DE) and its quality evaluated from the A260/280
ratio and agarose gel electrophoresis. Poly(A)+ RNA was
isolated using the Ambion Poly(A)Purist mRNA purification
kit (Applied Biosystems, Streetsville, ON). Its
quality was analyzed with a 2100 Bioanalyzer (Agilent
Technologies, Mississauga, ON) at the London Regional
Genomics Centre, ON.

cDNA library construction 

Standard and normalized cDNA libraries were prepared 
for each developmental stage. cDNA was synthesized 
from 1 μg poly(A)+ RNA using the SMART cDNA construction
kit (Clontech Laboratories, Mountain View,
CA). SMART-amplified cDNA was normalized using
the Trimmer-Direct cDNA normalization kit (Evrogen,
Moscow, Russia). SfiI-digested cDNA was purified using
the QIAquick PCR purification kit (Qiagen, Mississauga, 
ON). cDNA was size fractionated on a 1% agarose gel 
run at 45 V until the bromophenol blue dye was 2 cm 
away from the well, and the band was excised in two 
fractions corresponding to 0.5 to 1.5 kb and greater 
than 1.5 kb. cDNA was extracted using the QIAquick 
gel extraction kit (Qiagen), and the two fractions were 
ligated separately to a modified pBluescript II KS+ (Agilent)
containing SfiI cloning sites. Ligation reactions
were purified using the MinElute reaction cleanup kit
(Qiagen). The ligation product was transformed into
ElectroMAX DH10B T1 phage-resistant competent cells
(Invitrogen, Burlington, ON).

EST sequencing and analysis 

Culture plates (384-well) were inoculated with a Norgren 
Systems CP7200 colony picker (Ronceverte, WV). Plas- 
mids were amplified using TempliPhi (GE Healthcare 
Life Sciences, Baie d'Urfe, QC) and cycle sequenced 
using Applied Biosystems BigDye chemistry and a BioR- 
APTR FRD microfluidic workstation (Beckman Coulter 
Canada, Mississauga, ON) for amplification and sequen- 
cing reaction set-up. Completed 384-well sequencing 
plates were processed with Applied Biosystems 3730xl
DNA analyzers. Most clones were sequenced from the 5' 
end using M13 Reverse primer; a small number were also 
sequenced with the M13-20 primer. EST processing 
involved quality-trimming, vector-masking, low-complexity
masking, and poly-A trimming using custom Perl
scripts. Assembly was performed with TGICL [65] and
the results stored in the FIESTA-2 database at the Plant 
Biotechnology Institute of the National Research Council 
of Canada. A total of 30,147 ESTs were assembled into 
3,658 contigs and 6,027 singletons. Annotation was done 
by BLASTx against UniProt Plants version 15 where 
accessions with uninformative annotations such as 
"unknown protein", "putative predicted protein", "shot- 
gun sequence from scaffold", etc. had been removed 
(June 19, 2009) [66]. GO annotations were transferred 
from the best BLASTx hit to TAIR version 8 (June 8,
2009) if the e value was smaller than or equal to 1e-10.
Contigs and singletons reported in Figure 5 were verified 
for the absence of cloning or sequencing artifacts. 
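
Two of the processing steps described above, poly-A trimming and the transfer of GO annotations from the best hit under an e-value cutoff, can be sketched as follows. This is not the custom Perl code used in the study. The function names, the minimum tail length of 10 residues, the input dictionary layout and the locus names are all assumptions made for illustration. Only the 1e-10 cutoff comes from the text.

```python
import re

E_VALUE_CUTOFF = 1e-10  # threshold stated in the text


def trim_poly_a(seq, min_run=10):
    """Remove a trailing run of at least min_run A residues (min_run is assumed)."""
    return re.sub(r"A{%d,}$" % min_run, "", seq.upper())


def transfer_go(best_hits, go_by_locus, cutoff=E_VALUE_CUTOFF):
    """Assign to each query the GO terms of its best hit when e-value <= cutoff.

    best_hits:   {query_id: (hit_locus, e_value)}   (hypothetical layout)
    go_by_locus: {hit_locus: [GO term, ...]}
    """
    annotated = {}
    for query, (locus, e_value) in best_hits.items():
        if e_value <= cutoff and locus in go_by_locus:
            annotated[query] = go_by_locus[locus]
    return annotated


# Invented read with a 14-residue poly-A tail.
trimmed = trim_poly_a("ATGGCTTC" + "A" * 14)

# Invented hits and locus-to-GO mapping (locus names are placeholders).
hits = {"contig1": ("locus_A", 3e-42), "contig2": ("locus_B", 0.002)}
go_terms = {"locus_A": ["GO:0000103"], "locus_B": ["GO:0006790"]}
annotated = transfer_go(hits, go_terms)
```

Here only `contig1` receives an annotation, because the best hit of `contig2` has an e value above the cutoff.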



RACE and cDNA cloning 

Total RNA from SMARC1N-PN1 (20 mg seed weight) 
was digested with amplification grade DNase I (Invitrogen).
First strand cDNA was synthesized from 1 μg RNA using the
ThermoScript RT-PCR system (Invitrogen). 5'
RNA ligase mediated RACE was performed with the
Ambion FirstChoice RLM-RACE Kit (Applied Biosystems)
using Taq DNA polymerase (New England Biolabs,
Pickering, ON). 5' RACE gene specific outer (O)
and inner (I) primers were the following: for legumin,
Leg-O, 5'-TCCGCCACCATAAAGTTCTGT-3' and
Leg-I, 5'-CTAACTTGTCCTTGCCCCTCG-3'; and for
albumin-2, Alb2-O, 5'-TTACATCAGGAAAGGAATCAGGC-3'
and Alb2-I, 5'-TGTTTTGTGACGAGGATAAGGTG-3'.
RACE products were either blunted with
the Klenow fragment of DNA polymerase I and cloned
into the Zero Blunt TOPO vector (Invitrogen) or cloned
directly into the pGEM-T Easy vector (Promega, Madison,
WI). Full-length cDNAs were cloned by RT-PCR
with Pfx50 DNA polymerase (Invitrogen) into the Zero
Blunt TOPO vector using the following primers: for
legumin, Legumin-F, 5'-ACCAACCCATTCACCACTTC-3'
and Legumin-R, 5'-AAGAAAGGCTTGCTAGGATGG-3';
and for albumin-2, Albumin2-F, 5'-AAGCATCCTCAAATCAAATCA-3'
and Albumin2-R, 5'-AACCAAACCACCCACTTTTA-3'.
Multiple sequence
alignment and phylogenetic analysis with PHYLIP were 
performed as previously described [63]. Triticin was 
used as outgroup. 
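
Because the 5' RACE primers are antisense to the transcript, it can be useful to view their reverse complement and basic composition. The following sketch uses the Leg-O sequence as listed in this section. The helper names are hypothetical, and no melting temperature is computed because the formula used for primer design is not stated.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def gc_percent(dna):
    """Return the percentage of G and C residues in a DNA sequence."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


LEG_O = "TCCGCCACCATAAAGTTCTGT"  # legumin outer 5' RACE primer from the text
leg_o_sense = reverse_complement(LEG_O)  # strand the primer anneals to
leg_o_gc = gc_percent(LEG_O)
```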

Protein extraction and purification 

Soluble protein was extracted from mature seed according
to VandenBosch et al. [67], except that 50 mM Tris-HCl,
pH 7.5 was replaced with 50 mM Tris-HCl, pH
8.0. Protein concentration was measured with the Bio-Rad
Protein Assay solution and bovine serum albumin
as standard. Protein precipitated with saturated ammonium
sulfate between 50 and 60% (w/v) was dissolved in
50 mM bis-Tris-HCl, pH 6.5 containing 14 mM 2-mercaptoethanol
and desalted on a PD-10 column (GE
Healthcare Life Sciences). The extract was separated by 
ion exchange chromatography on a HiPrep 16/10 Q FF 
column, using an AKTApurifier system (GE Healthcare 
Life Sciences). Protein was eluted with a linear gradient 
of 0 to 0.5 M NaCl. Purified fractions of interest were 
concentrated using an Amicon Ultra-15 centrifugal filter
unit with Ultracel-10 membrane (Millipore, Billerica, 
MA) prior to size exclusion chromatography on a 
HiLoad Superdex 75 prep grade column (GE Healthcare 
Life Sciences) using 50 mM bis-Tris-HCl, 150 mM NaCl 
containing 14 mM 2-mercaptoethanol as buffer. Proteins 
were further chromatographed by size exclusion on a 
Superose 6 10/300 GL column. Molecular weight was 
determined from a plot of the partition coefficient, Kav,
versus the logarithm of the molecular weight of protein
standards. The following proteins were used for calibration:
thyroglobulin (669 kD); ferritin (440 kD); and catalase
(232 kD).
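
The calibration described above can be sketched as a least-squares line of Kav against log10(molecular weight), which is then inverted for an unknown. Kav is taken here in its usual form, (Ve - V0)/(Vt - V0). The three standards are those named in the text. The elution volumes, void volume and total volume are hypothetical values chosen only so that the example runs.

```python
import math


def k_av(v_e, v_0, v_t):
    """Partition coefficient: (Ve - V0) / (Vt - V0)."""
    return (v_e - v_0) / (v_t - v_0)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(kav, slope, intercept):
    """Invert the calibration line; the result is in the units of the standards."""
    return 10 ** ((kav - intercept) / slope)


standards_kd = [669.0, 440.0, 232.0]  # thyroglobulin, ferritin, catalase
elution_ml = [12.2, 13.1, 14.6]       # hypothetical elution volumes (mL)
V0, VT = 8.0, 24.0                    # assumed void and total volumes (mL)

kavs = [k_av(v, V0, VT) for v in elution_ml]
slope, intercept = fit_line([math.log10(m) for m in standards_kd], kavs)
unknown_kd = estimate_mw(k_av(13.5, V0, VT), slope, intercept)
```

With only three standards the fit is sensitive to each point, so estimates outside the 232 to 669 kD range would be extrapolations.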

Protein identification 

To confirm protein and cDNA identity, LC-MS/MS data 
from 2-D gel spots [11] was used for de novo sequen- 
cing with PEAKS Studio v. 4.5 (Bioinformatics Solutions 
Inc., Waterloo, ON) [68] as previously described, and 
compared to conceptual cDNA translations. Protein 
fractions eluted from the HiLoad Superdex 75 prep 
grade column, and the 44.0 and 42.7 kD legumin α-subunits
present in Superose 6 10/300 GL fractions, were sepa- 
rated by SDS-PAGE on 12% polyacrylamide gels. Bands 
were excised, digested with trypsin and the resulting 
peptides were subjected to LC-MS for protein identifica- 
tion. For group 3 late embryogenesis abundant protein, 
and 44.0 and 42.7 kD legumin α-subunits, LC-MS/MS
was performed as previously described for 2-D gel spots
[11], except that the gradient was lengthened to 90 min.
The 44.0 and 42.7 kD legumin α-subunits were identified
by Mascot search of the assembled EST database, to
which the legumin cDNA sequence had been appended, 
using an in-house server [69]. For the major legumin 
bands, peptides were analyzed by LC-MS using an Alli- 
ance 2690 HPLC and a model LCT orthogonal accelera- 
tion time-of-flight mass spectrometer (Waters, 
Mississauga, ON). Samples were diluted to 100 μL with
100:0.1 (v/v) water-formic acid. A 50 μL portion was
injected into a 100 μL loop attached to a 10-port valve
(VICI Valco Canada, Brockville, ON) and the valve was 
switched to permit transfer to a 1 mm ID x 5 mm C18 
PepMap 100 trapping column (Dionex, Bannockburn, 
IL), using 100:0.1 (v/v) water-formic acid flowing at 0.1 
mL/min from an auxiliary high pressure pump. After 10 
min transfer/washing, the valve was switched to place 
the trapping column in line with a 1 mm ID x 150 mm 
C18 PepMap 100 analytical column (Dionex) and the 
peptides were eluted with a gradient of water-acetonitrile-formic
acid flowing at 30 μL/min. An ACURATE
flow splitter (Dionex) was used to reduce the flow of 0.3 
mL/min from the HPLC to this level. Solvent A was 
90:10:0.1 (v/v/v) water-acetonitrile-formic acid and sol- 
vent B was 10:90:0.1 water-acetonitrile-formic acid. The 
gradient conditions were 5 min at 100% A, a linear 
increase to 40% B from 5 to 35 min, a 5 min hold at 
40% B followed by a return to 100% A at 45 min and a 
15 min equilibration time. The column effluent was 
transferred to a Megaflow electrospray probe of the 
mass spectrometer where the components were ionized 
and subsequently analyzed. The ion source was operated 
in positive mode with a capillary voltage of 3 kV with 
nitrogen as desolvation gas flowing at 250 L/h and
heated to 250°C. Mass spectra were acquired from 85 to
1500 m/z. The cone voltage was switched between 20 
and 50 V at 1.1 sec intervals during data acquisition to 
permit fragmentation of individual peptides under favor- 
able conditions of chromatographic separation and high 
peptide concentration. The LC-MS system was operated 
with MassLynx v. 4.0 and calibrated using a mixture of 
horse heart myoglobin and trypsinogen. Peak lists were 
generated from MassLynx raw data files using Mascot 
Distiller v. 2.2.1 (Matrix Science, Boston, MA). Peptide 
fragment data for each sample was processed by exam- 
ining chromatographic peaks for the purity of their 20 V 
spectra, combining 50 V spectra across a suitable peak 
as representative of fragments of a single peptide and 
saving the data as a mass vs. intensity text file. These 
text files for the peptides in a sample were manually 
combined into a text file in .pkl format representative of 
the sample. Peptide mass fingerprint data was analyzed 
against the conceptual translation using PAWS (Digilab, 
Holliston, MA) and peptide sequence tags derived with 
Peaks Studio. 
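
The gradient program and flow split described in this section can be written out as a short worked example. The sketch assumes that the return to 100% A between 40 and 45 min is a linear ramp, which the text does not specify. The function names are hypothetical. Solvent compositions (A, 10% acetonitrile; B, 90% acetonitrile) and flow rates are those given above.

```python
def percent_b(t_min):
    """Percentage of solvent B at time t_min (0 to 60 min) of the gradient."""
    if t_min < 0 or t_min > 60:
        raise ValueError("time outside the 60 min program")
    if t_min <= 5:
        return 0.0                             # 5 min at 100% A
    if t_min <= 35:
        return 40.0 * (t_min - 5) / 30.0       # linear increase to 40% B
    if t_min <= 40:
        return 40.0                            # 5 min hold at 40% B
    if t_min <= 45:
        return 40.0 * (45 - t_min) / 5.0       # assumed linear return to 100% A
    return 0.0                                 # 15 min equilibration


def acetonitrile_percent(b_pct):
    """Acetonitrile content of the mixture, from 10% in A and 90% in B."""
    return 10.0 * (1 - b_pct / 100.0) + 90.0 * (b_pct / 100.0)


hplc_flow_ul_min = 300.0    # 0.3 mL/min delivered by the HPLC
column_flow_ul_min = 30.0   # flow through the analytical column
split_ratio = hplc_flow_ul_min / column_flow_ul_min
```

Under these assumptions, peptides elute between 10% and 42% acetonitrile, and the splitter diverts nine tenths of the pump flow away from the column.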

Statistical analysis 

K-means and hierarchical clustering were performed with
GeneSpring GX v. 11.5 (Agilent). 
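
For readers unfamiliar with K-means, a minimal version of Lloyd's algorithm is sketched below. It is not the GeneSpring implementation, whose distance metric and initialization are not described here. The sketch assumes squared Euclidean distance and caller-supplied starting centroids, and the four expression profiles across four stages are invented.

```python
def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm with squared Euclidean distance and given start centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: mean of the members of each cluster.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Invented profiles: two rising and two falling across four stages.
profiles = [(1, 2, 8, 9), (1, 3, 7, 9), (9, 8, 2, 1), (8, 9, 1, 2)]
labels, centroids = kmeans(profiles, [profiles[0], profiles[2]])
```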

Accession numbers 

Nucleotide sequence data from this article has been 
deposited in the GenBank database under accession 
numbers [GenBank ID: GW884178 to GW914324] for 
seed ESTs; [GenBank ID: HM240256] for legumin; 
[GenBank ID: HM240257] for albumin-2; [GenBank ID: 
HM240258] for defensin D1; [GenBank ID: HM240259]
for defensin D2; [GenBank ID: HM240260] for albumin-1A;
[GenBank ID: HM240261] for albumin-1B;
[GenBank ID: HM240262] for albumin-1D; [GenBank ID:
HM240263] for albumin-1F; [GenBank ID: HM240264]
for albumin-1E; [GenBank ID: HM240265] for albumin-1C;
and [GenBank ID: HM240266] for albumin-1G.
Additional amino acid sequence data can be found 
under accession numbers [UniProt ID: Q41115] for α-phaseolin;
[UniProt ID: Q43632] for phaseolin encoded
by Phs; [UniProt ID: Q8RVX5] for lectin encoded by
lec4-B17; [UniProt ID: Q8RVX6] for phytohemagglutinin
encoded by pha-E; [UniProt ID: P15231] for leucoagglutinating
phytohemagglutinin encoded by PDLEC2;
[UniProt ID: Q43628] for phytohemagglutinin; [UniProt ID:
Q6J2U4] for α-amylase inhibitor-1; [UniProt ID:
Q9SMH0] for α-amylase inhibitor-like protein; [UniProt
ID: Q9SB11] for glycinin A5A4B3 (G. max); [UniProt 
ID: Q647H1] for arachin 5; [UniProt ID: B5U8K1] for 
legumin storage protein 2; [UniProt ID: B5U8K2] for 
legumin storage protein 3; [UniProt ID: Q39921] for glycinin
A5A4B3 (G. soja); [UniProt ID: A3KEY8] for
glycinin A3B4 (G. soja); [UniProt ID: P04347] for glycinin
A3B4 (G. max); [UniProt ID: Q3HW60] for glycinin
G7; [UniProt ID: O24294] for minor small legumin;
[UniProt ID: Q43673] for legumin-related high molecular
weight polypeptide; [UniProt ID: P05190] for legumin
B (V. faba); [UniProt ID: Q41703] for legumin B
(V. sativa); [UniProt ID: P05692] for legumin J; [UniProt
ID: P11828] for glycinin G3; [UniProt ID: P04776] for
glycinin G1; [UniProt ID: P04405] for glycinin G2; [UniProt
ID: Q0GM57] for iso-Arah3; [UniProt ID: Q5I6T2]
for arachin Ahy-4; [UniProt ID: Q647H2] for arachin 
Ahy-3; [UniProt ID: B5U8K6] for legumin storage pro- 
tein 5; [UniProt ID: Q9T0P5] for legumin A (P. sati- 
vum); [UniProt ID: Q41702] for legumin A (V. sativa); 
[UniProt ID: Q99304] for legumin A2; [UniProt ID: 
Q9SMJ4] for legumin (C. arietinum); [UniProt ID: 
Q53I54] for legumin-like protein; [UniProt ID: 
B2CGM6] for triticin; [UniProt ID: Q43680] for seed 
albumin (V. radiata); [UniProt ID: P08688] for albumin-2
(P. sativum); [UniProt ID: Q8W434] for defensin D2
(V. radiata); [UniProt ID: C1K3M7] for defensin (V.
unguiculata); [UniProt ID: Q7XZC2] for albumin-1 (P.
vulgaris); [UniProt ID: Q39837] for albumin-1 (G. max).
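
As a consistency check, the size of the GenBank accession range given for the seed ESTs can be compared with the number of ESTs reported in the EST sequencing section. The helper name below is hypothetical. The accession numbers and the contig and singleton counts are those stated in the text.

```python
def accession_range_size(first, last):
    """Count accessions in an inclusive range that shares one letter prefix."""
    prefix_first = first.rstrip("0123456789")
    prefix_last = last.rstrip("0123456789")
    if prefix_first != prefix_last:
        raise ValueError("accessions do not share a prefix")
    return int(last[len(prefix_last):]) - int(first[len(prefix_first):]) + 1


n_est_accessions = accession_range_size("GW884178", "GW914324")
n_unigenes = 3658 + 6027  # contigs plus singletons from the assembly
```

The range spans 30,147 accessions, matching the number of ESTs assembled, and the assembly corresponds to 9,685 contigs and singletons combined.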